Feed-forward Neural Networks (MLP)


Let's load the MNIST dataset first


In [1]:
# Necessary imports
import time
from IPython import display

import numpy as np
from matplotlib.pyplot import imshow
from PIL import Image, ImageOps
import tensorflow as tf

%matplotlib inline

from tensorflow.examples.tutorials.mnist import input_data

# Read the mnist dataset
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)


Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz

Our feed-forward neural network will look very similar to our softmax classifier. However, now we have multiple layers, with non-linear activations applied to each layer's logits!

For this network, we will have 3 layers: 2 hidden layers and 1 output layer. The output layer is the softmax classifier that we implemented in the previous example.
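
To make the dimensions concrete, here is a purely illustrative shape walk-through (a sketch with numpy stand-ins, using the layer sizes we will pick below, 256 and 128, and an arbitrary batch of 100 images):

# Illustrative shape check only: these numpy arrays are not part of the TensorFlow graph
x_demo   = np.zeros((100, 784))               # a batch of 100 flattened 28x28 images
h1_demo  = x_demo.dot(np.zeros((784, 256)))   # output of hidden layer 1: (100, 256)
h2_demo  = h1_demo.dot(np.zeros((256, 128)))  # output of hidden layer 2: (100, 128)
out_demo = h2_demo.dot(np.zeros((128, 10)))   # class scores: (100, 10), one per digit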

As in the previous example, let's first define our hyperparameters, which now also include the layer sizes.
Layer sizes are usually chosen as powers of 2, mainly for computational convenience.


In [2]:
# Hyperparameters (these are similar to the ones used in the previous example)
learning_rate = 0.5
training_epochs = 5
batch_size = 100

# Additional hyperparameters for our Neural Nets - Layer sizes
layer_1_size = 256
layer_2_size = 128
  • Step 1: Create placeholders to hold the images and their labels.

In [3]:
# Create placeholders
x = tf.placeholder(tf.float32, shape=(None, 784))
y = tf.placeholder(tf.float32, shape=(None, 10))
  • Step 2: Create variables to hold the weight matrices and the bias vectors for all the layers

    Note that the weights are now initialized with small random numbers. From Karpathy:

    "A reasonable-sounding idea then might be to set all the initial weights to zero, which we expect to be the “best guess” in expectation. This turns out to be a mistake, because if every neuron in the network computes the same output, then they will also all compute the same gradients during backpropagation and undergo the exact same parameter updates. In other words, there is no source of asymmetry between neurons if their weights are initialized to be the same."

    "the implementation for one weight matrix might look like W = 0.01* np.random.randn(D,H), where randn samples from a zero mean, unit standard deviation gaussian."

    Suggested rule of thumb:

    "Initialize the weights by drawing them from a gaussian distribution with standard deviation of sqrt(2/n), where n is the number of inputs to the neuron. E.g. in numpy: W = np.random.randn(n) * sqrt(2.0/n)." (A TensorFlow sketch of this rule appears after the cell below.)


In [4]:
# Model parameters that have to be learned

# Note that the weights & biases are now initialized to small random numbers
# Also note that the number of columns should be the size of the first layer!
W_h1 = tf.Variable(0.01 * tf.random_normal([784, layer_1_size]))
b_h1 = tf.Variable(tf.random_normal([layer_1_size]))

# Layer 2
# The input dimensions are not 784 anymore but the size of the first layer. 
# The number of columns are the size of the second layer
W_h2 = tf.Variable(0.01 * tf.random_normal([layer_1_size, layer_2_size]))
b_h2 = tf.Variable(tf.random_normal([layer_2_size]))

# Output layer - Layer 3
# This is the softmax layer that we implemented earlier
# The input dimension size is now the size of the 2nd layer and the number of columns = number of classes
W_o = tf.Variable(0.01 * tf.random_normal([layer_2_size, 10]))
b_o = tf.Variable(tf.random_normal([10]))
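
The cell above uses the 0.01-scaled gaussian from the quote. As a hedged sketch only (these variables are not used anywhere below), the sqrt(2/n) rule of thumb could be written in TensorFlow like this, with n being the number of inputs to each layer:

# Sketch only: sqrt(2/n)-scaled initialization per the rule of thumb; not used by the rest of this notebook
W_h1_he = tf.Variable(tf.random_normal([784, layer_1_size], stddev=np.sqrt(2.0 / 784)))
W_h2_he = tf.Variable(tf.random_normal([layer_1_size, layer_2_size], stddev=np.sqrt(2.0 / layer_1_size)))
W_o_he = tf.Variable(tf.random_normal([layer_2_size, 10], stddev=np.sqrt(2.0 / layer_2_size)))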
  • Step 3: Let's build the flow of data. Each layer computes its logits (a linear function, W * X + b). Next, it applies a non-linear activation function to each of the weighted sums and passes the results on as inputs to the next layer.
  • Step 4: Compute the loss as the cross entropy between the label distribution predicted by the output layer and the true distribution.
    Note that we will now simply use TensorFlow's built-in softmax cross-entropy function, tf.nn.softmax_cross_entropy_with_logits. (A hand-written version of the same loss is sketched after the cell below.)

In [5]:
# Get the weighted sum for the first layer
preact_h1 = tf.matmul(x, W_h1) + b_h1
# Compute the activations, which form the output of this layer
out_h1 = tf.sigmoid(preact_h1)
# out_h1 = tf.nn.relu(preact_h1)

# Get the weighted sum for the second layer
# Note that the input is now the output from the previous layer
preact_h2 = tf.matmul(out_h1, W_h2) + b_h2
# Compute the activations, which form the output of this layer
out_h2 = tf.sigmoid(preact_h2)
# out_h2 = tf.nn.relu(preact_h2)

# Get the logits for the softmax output layer
logits_o = tf.matmul(out_h2, W_o) + b_o

# The final layer has no explicit activation: the softmax is applied inside the cross entropy op below
cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits_o))
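
For reference, the fused op above is equivalent (in exact arithmetic) to applying a softmax and then computing the cross entropy by hand. Here is a sketch of that manual version, shown only for clarity since the fused op is more numerically stable:

# Sketch only: explicit softmax + cross entropy, equivalent to the fused op above but less stable
probs_o = tf.nn.softmax(logits_o)
manual_cross_entropy_loss = tf.reduce_mean(-tf.reduce_sum(y * tf.log(probs_o), axis=1))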
  • Step 5: Let's create an optimizer to minimize the cross entropy loss. (A short note on the commented-out Adam alternative follows the cell below.)

In [6]:
# Create an optimizer with the learning rate
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# optimizer = tf.train.AdamOptimizer(learning_rate)

# Use the optimizer to minimize the loss
train_step = optimizer.minimize(cross_entropy_loss)
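
If you try the commented-out Adam optimizer, keep in mind that a step size of 0.5 is much larger than what Adam is usually run with; TensorFlow's default for AdamOptimizer is 0.001, which is the usual starting point. A hedged sketch, not used in this notebook:

# Sketch only: Adam with its default learning rate instead of the SGD step above
# optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
# train_step = optimizer.minimize(cross_entropy_loss)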
  • Step 6: Let's compute the accuracy

In [7]:
# First compute the correct predictions by taking the arg max of the predicted class scores
# and comparing it with the true class. The result is a boolean vector
correct_predictions = tf.equal(tf.argmax(logits_o, 1), tf.argmax(y, 1))
# Calculate the accuracy over all the images
# Cast the boolean vector into float (1s & 0s) and then compute the average. 
accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))

Now let's run our graph as usual


In [8]:
# Initializing global variables
init = tf.global_variables_initializer()

# Create a saver to save our model
saver = tf.train.Saver()

# Create a session to run the graph
with tf.Session() as sess:
    # Run initialization
    sess.run(init)

    # For the set number of epochs
    for epoch in range(training_epochs):
        
        # Compute the total number of batches
        num_batches = int(mnist.train.num_examples/batch_size)
        
        # Iterate over all the examples (1 epoch)
        for batch in range(num_batches):
            
            # Get a batch of examples
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # Now run the session 
            curr_loss, cur_accuracy, _ = sess.run([cross_entropy_loss, accuracy, train_step], 
                                                    feed_dict={x: batch_xs, y: batch_ys})
            
            if batch % 50 == 0:
                display.clear_output(wait=True)
                time.sleep(0.05)
                # Print the loss
                print("Epoch: %d/%d. Batch: %d/%d. Current loss: %.5f. Train Accuracy: %.2f"
                      %(epoch, training_epochs, batch, num_batches, curr_loss, cur_accuracy))
            
    # Run the session to compute the test accuracy and print it
    test_accuracy = sess.run(accuracy,
                                       feed_dict={x: mnist.test.images, 
                                                  y: mnist.test.labels})
    print("Test Accuracy: %.2f"%test_accuracy)
    
    # Let's save the trained model
    saver.save(sess, '../models/ff_nn.model')


Epoch: 4/5. Batch: 500/550. Current loss: 0.19177. Train Accuracy: 0.93
Test Accuracy: 0.93

In [9]:
# Load the model back and test its accuracy
with tf.Session() as sess:
    saver.restore(sess, '../models/ff_nn.model')
    test_accuracy = sess.run(accuracy,
                                       feed_dict={x: mnist.test.images, 
                                                  y: mnist.test.labels})
    print("Test Accuracy: %.2f"%test_accuracy)


Test Accuracy: 0.93